Parameter Space Compression Underlies Emergent Theories and Predictive Models 
We report a similarity between the microscopic parameter dependence of emergent theories in 
physics and that of multiparameter models common in other areas of science. In both cases, predic- 
tions are possible despite large uncertainties in the microscopic parameters because these details are 
compressed into just a few governing parameters that are sufficient to describe relevant observables. 
We make this commonality explicit by examining parameter sensitivity in a hopping model of diffu- 
sion and a generalized Ising model of ferromagnetism. We trace the emergence of a smaller effective 
model to the development of a hierarchy of parameter importance quantified by the eigenvalues of 
the Fisher Information Matrix. Strikingly, the same hierarchy appears ubiquitously in models taken 
from diverse areas of science. We conclude that the emergence of effective continuum and universal 
theories in physics is due to the same parameter space hierarchy that underlies predictive modeling 
in other areas of science. 



The success of science and the comprehensibility of 
nature owe much to the hierarchical character of 
scientific theories [TJ [5] . These theories of our physical 
world, ranging in scales from the sub-atomic to the as- 
tronomical, model natural phenomena as if physics at 
macroscopic length scales were almost independent of 
the underlying, shorter length scale details. For exam- 
ple, understanding string theory or some other funda- 
mental high energy theory is not necessary for quantita- 
tively modeling the behavior of superconductors that op- 
erate in a lower energy regime. The fact that many lower 
level theories in physics can be systematically coarsened 
(renormalized) into macroscopic effective models, estab- 
lishes and quantifies their hierarchical character. More- 
over, experience suggests that a similar hierarchy of the- 
ories is also at play in multiparameter models in other 
areas of science even though a similarly systematic coars- 
ening or model reduction is often difficult [3HZ]- In fact, 
as we show here, the effectiveness of these emergent the- 
ories in physics also relies on the same parameter space 
hierarchy that is ubiquitous in multiparameter models. 

Recent studies of nonlinear, multiparameter models 
drawn from disparate areas in science have shown that 
predictions from these models largely depend only on 
a few 'stiff' combinations of parameters [HI |H1 E]. This 
recurring characteristic (termed 'sloppiness') appears to 
be an inherent property of these models and may be a 
manifestation of an underlying universality . Indeed, 
many of the practical and philosophical implications of 
sloppiness are identical to those of the renormalization 
group (RG) and continuum limit methods of statistical 
physics: models show weak dependence of macroscopic 
observables (defined at long length and time scales) on 
microscopic details. They thus have a smaller effective 
model dimensionality than their microscopic parameter 
space [12]. To clarify their connection to sloppiness, we 
apply an information theory based analysis to models 
where the continuum limit and the renormalization group 
already give a quantitative explanation for the emergence 
of effective models — a hopping model of diffusion and an 



Ising model of ferromagnetism and phase transitions. In 
both cases, our results show that at long time and length 
scales a similar hierarchy develops in the microscopic pa- 
rameter space, with sensitive, or 'stiff' directions corre- 
sponding to the relevant macroscopic parameters (such as 
the diffusion constant in the diffusion model). Moreover, 
as we show below, even where model reduction cannot be 
systematically generated, stiff combinations of parameters 
still do describe a universal effective model of a smaller 
dimension that captures most collective observables. 

We use information theory to track the development 
of this hierarchy in microscopic parameter space. The 
sensitivity of model predictions to changes in parameters 
is quantified by the Fisher Information Matrix (FIM). 
The FIM forms a metric that converts parameter space 
distance into a unique measure of distinguishability between a model with parameters $\theta^\mu$ (for $1 \le \mu \le N$) and 
a nearby model with parameters $\theta^\mu + \delta\theta^\mu$ (see supplementary text and [51 1141 ITS]). This divergence is given 
by $ds^2 = g_{\mu\nu}\,\delta\theta^\mu\,\delta\theta^\nu$, where $g_{\mu\nu}$ is the FIM defined by: 

$$g_{\mu\nu} = -\sum_x P_\theta(x)\,\frac{\partial^2 \log P_\theta(x)}{\partial\theta^\mu\,\partial\theta^\nu} \qquad (1)$$

where $P_\theta(x)$ is the probability that a (stochastic) model 
with parameters $\theta^\mu$ would produce observables $x$. In the 
context of nonlinear least squares, g is the Hessian of chi- 
squared, the sum of squares of independent standard nor- 
mal residuals of data-fitting (supplementary text). Dis- 
tance in this metric space is a fundamental measure of 
distinguishability in stochastic systems. Sorted by eigen- 
values, eigenvectors of g describe a hierarchy of linear 
combinations of parameters that govern system behavior. 
Previously, it was shown that in nonlinear least squares 
models, the eigenvalues form a roughly geometrical se- 
quence, reaching extremely small values in many models 
(figure 1). Thus, the eigenvalues of the FIM quantify 
a hierarchy in parameter space: a few 'stiff' eigenvectors 
in each model point along directions where observables 
are sensitive to changes in parameters, while progressively 
sloppier directions make little difference for observables. 
These sloppy parameters cannot be inferred from 
data, and conversely, their exact values do not need to be 
known to quantitatively understand system behavior [S] . 
To see how this comes about, we turn to a 'microscopic' 
model of stochastic motion from which the diffusion equation emerges. 

FIG. 1: Eigenvalues of the Fisher Information Matrix (FIM) 
of various models are shown. The diffusive hopping model and 
the Ising model of ferromagnetism, shown in the first two columns, 
are explored in this paper. Models of radioactive decay and a 
neural network are taken from a previous study [7]. The sys- 
tems biology model is a differential equation model of a MAP 
kinase cascade taken from [17]. In all models, we find that the 
eigenvalues of the FIM are roughly geometrically distributed 
forming a hierarchy in parameter space, with each successive 
direction significantly less important. Eigenvalues are normal- 
ized to unit stiffest value; only the first 10 decades are shown. 
This means that inferring the parameter combination whose 
eigenvalue is the smallest shown would require ~$10^{10}$ times more 
data than the stiffest parameter combination. Conversely, 
this means that the least important parameter combination 
is $\sqrt{10^{10}}$ times less important for understanding system 
behavior. This is a much larger range in eigenvalues than that 
predicted by Wishart statistics (black line marked random), 
the naive expectation for least squares problems. 

The diffusion equation is the canonical example of a 
continuum limit. It governs behavior whenever small 
particles undergo stochastic motion. Given translation 
invariance in space and time, it subsumes complex micro- 
scopic collisions into an equation with only three terms 
which describe the time evolution of the particle density 
$\rho$: $\partial_t \rho(r, t) = D\nabla^2\rho - v\cdot\nabla\rho + R\rho$, where $D$ is the diffusion 
constant, $v$ is the drift, and $R$ is the particle creation 
rate. Microscopic parameters describing the particles and 
their environment enter into this continuum description 
only through their effects on the terms in this equation. 
To see this, consider a microscopic model of stochastic 
motion on a discrete 1-dimensional lattice of sites, with 
$2N + 1$ parameters $\theta^\mu$ for $-N \le \mu \le N$ which describe 
the probability that in a discrete time step a particle will 
hop from site $j$ to site $j + \mu$ (figure 2 inset). At initial 
time, all particles are at the origin, $\rho_0(j) = \delta_{j,0}$. The 
observables, $x = \rho_t(j)$, are the densities of particles at 
some later time $t$. 

After a single time step the distribution of particles 
is given by $\rho_1(j) = \theta^j$. This distribution depends 
independently on all of its parameters, thus the FIM is the 
identity, $g_{\mu\nu} = \delta_{\mu\nu}$ (supplementary text). After a single 
time step, there is no parameter hierarchy — each param- 
eter is measured independently. When particles take sev- 
eral time steps before their positions are observed, some 
parameter combinations become easier to measure: fewer 
coarsened observations achieve the same accuracy. Other 
parameter combinations become harder to measure, re- 
quiring exponentially more observations (supplementary 
text). At late times, the particle creation rate, R, be- 
comes easier to measure as the mean particle number 
changes exponentially with time. The next eigenvalue, 
the drift, also becomes easier to measure as time passes. 
The diffusion constant itself becomes harder to measure 
as time passes, and further eigenvectors, describing the 
skew, kurtosis and higher moments of the final distribu- 
tion become harder and harder to measure as more time 
steps are made, each with a higher negative power of t 
(see figure [2] and supplementary text). This gives an in- 
formation theoretic explanation for the wide applicability 
of the diffusion equation. Any system with stochastic mo- 
tion and conservation of particle number will have a drift 
term dominate if it is present (for example, for a small 
particle falling through honey under gravity, in which we 
might neglect diffusion). If drift is constrained to be zero, 
by symmetry for example, then the diffusion constant 
will dominate in the continuum limit. Since the diffusion 
constant cannot be removed for stochastic systems, there 
is never a need for higher terms to enter into a contin- 
uum description. These results quantify a widely held 
intuition: one cannot infer microscopic parameters, such 
as the bond angle of a water molecule, from a diffusion 
measurement, and conversely it would also be unneces- 
sary to have such knowledge to quantitatively understand 
the coarse behavior of diffusing particles in water. 
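
The development of this hierarchy can be checked numerically. The following is a minimal sketch, not the authors' code, for the seven-parameter case with all hopping probabilities equal to 1/7. It assumes unit Gaussian measurement noise on each site's density, so that the FIM is $J^T J$ with $J_{j\mu} = \partial\rho_t(j)/\partial\theta^\mu$; the function name `hopping_fim_spectrum` is hypothetical.

```python
import numpy as np

def hopping_fim_spectrum(theta, t):
    """Eigenvalues (descending) of the FIM for densities observed after t steps.

    Assumes all particles start at the origin and unit Gaussian noise on each
    site's density, so the FIM is J^T J with J[j, mu] = d rho_t(j) / d theta_mu.
    """
    theta = np.asarray(theta, dtype=float)
    n_params = theta.size
    # Density after t-1 steps: the (t-1)-fold convolution of theta with itself.
    rho_prev = np.array([1.0])
    for _ in range(t - 1):
        rho_prev = np.convolve(rho_prev, theta)
    # rho_t is the t-fold convolution of theta, so its derivative with respect
    # to theta_mu is t times rho_{t-1} shifted by mu sites.
    n_sites = rho_prev.size + n_params - 1
    jac = np.zeros((n_sites, n_params))
    for mu in range(n_params):
        jac[mu:mu + rho_prev.size, mu] = t * rho_prev
    # Squared singular values of J are the eigenvalues of J^T J.
    return np.linalg.svd(jac, compute_uv=False) ** 2

theta = np.full(7, 1.0 / 7.0)
for t in (1, 3, 5, 7):
    print(t, hopping_fim_spectrum(theta, t))
```

At $t = 1$ every eigenvalue equals one; at later times the printed spectra spread over several decades, with the leading eigenvalues growing and the trailing ones shrinking.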

Continuum models like the diffusion equation arise 
when fluctuations are only large on the micro scale. 
Their success can be said to rely on the largeness and 
slowness of observables when compared with the natu- 
ral scale of fluctuations. However, RG methods clarify 
that system behavior can be universal even when fluc- 
tuations are large on all scales, as occurs near critical 
points. The Ising model is the simplest model which ex- 
hibits nontrivial thermodynamic critical behavior. Near 
its critical point, the Ising model predicts fractal do- 
mains whose statistics are universal, quantitatively de- 
scribing the spatial structure of magnetic fluctuations 
in ferromagnets, the density fluctuations near a liquid-gas 
transition and the composition fluctuations near a 
liquid-liquid miscibility transition [18, 19]. Consider a 

FIG. 2: We consider a hopping model on a 1-D lattice, with 
seven parameters describing the probability that a particle 
will remain at its current site or move to one of its six nearest 
neighbors in a discrete time step. We calculate the FIM 
for this model, for observations taken after a given number of 
time steps, for the case where all parameters take the value 
$\theta^\mu = 1/7$. Top row shows the resulting densities plotted at 
times $t = 1, 3, 5, 7$. Bottom plot shows the eigenvalues of the 
FIM versus number of steps. After a single time step, the 
FIM is the identity, but as time progresses, the spectrum of 
the FIM develops a hierarchy spanning many orders of magnitude. 
The first eigenvector measures a net rate of particle 
creation, $R$. The next eigenvector measures a net drift in the 
density, $v$. The third eigenvector corresponds to parameter 
combinations that change the diffusion constant, $D$. Each 
of the above will dominate a continuum description if those 
above it are constrained to be 0 (or are otherwise small). 
Further eigenvectors describe parameter combinations that 
do not affect these macroscopic parameters, but instead mea- 
sure the kurtosis, skew, and higher moments of the resulting 
density. 



two dimensional square lattice Ising model where at every site a 'spin' takes a value of $s_{i,j} = \pm 1$. Observables are spin configurations ($x = \{s_{i,j}\}$) or subsets of spin configurations ($x^n$, as defined below). The Ising model assigns to each spin configuration a probability given by its Boltzmann weight, $P_\theta(x) = e^{-H_\theta(x)}/Z$. The model is parametrized through its Hamiltonian $H_\theta(x) = \theta^\mu \Phi_\mu(x)$, where $\theta^\mu$ are parameters describing a field $\theta^0$, which multiplies $\Phi_0(x) = \sum_{i,j} s_{i,j}$, or a coupling between spins and one of their nearby neighbors, $\theta^{\alpha\beta}$, multiplying $\Phi_{\alpha\beta}(x) = \sum_{i,j} s_{i,j}\, s_{i+\alpha,j+\beta}$ (see inset of figure 3 and supplementary text). 

At the microscopic level, all spins are observable and 



the Ising FIM is a sum of 2- and 4-spin correlation functions 
that can be readily calculated using Monte-Carlo 
techniques ([9] and supplementary text). Near the critical 
point, it has two 'relevant' eigenvectors with eigenvalues 
that diverge like the specific heat and magnetic 
susceptibility [lOl H2]. These two large eigenvalues have no 
analog in the diffusion equation, and reflect the presence 
of fluctuations at scales much larger than the microscopic 
scale (here this scale is the lattice constant: the distance 
between neighboring sites). The remaining eigenvalues 
all take a characteristic scale given by the system size, in 
units of the lattice constant (supplementary text). The 
clustering of the remaining eigenvalues is reminiscent of 
the spectrum seen in the diffusion equation when viewed 
at its microscopic (time) scale. When observables are mi- 
croscopic spin configurations, the nearest neighbor Ising 
model is a poor description of a binary liquid, and even 
of a ferromagnet. 

To coarsen the Ising model, the observables are restricted to a subset of lattice sites chosen via a checkerboard decimation procedure (figure 3 top row inset figures). The FIM of equation 1 is now measured using as our observables only those sites in a sub-lattice decimated by a factor $2^n$, that is, $x^n = \{s_{i,j}\}$ restricted to the sites that remain at level $n$. For example, after 1 level of decimation, this corresponds to the black sites on the checkerboard, while after 2 steps, only sites $\{i, j\}$ where $i$ and $j$ are even remain. Importantly, the distribution is still drawn from the ensemble defined by the original Hamiltonian defined on the full lattice. The calculation is implemented using compatible Monte-Carlo ([T] and supplementary text). 
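
The choice of observed sites can be made concrete with a short sketch. The helper `decimation_mask` below is hypothetical; levels 1 and 2 follow the description above, while the rule for higher levels assumes that the same two-step pattern simply repeats on the surviving sub-lattice.

```python
import numpy as np

def decimation_mask(size, level):
    """Boolean mask of observed sites after `level` checkerboard decimations.

    Levels 1 and 2 follow the description in the text; higher levels assume
    the same two-step pattern repeats on the surviving sub-lattice.
    """
    i, j = np.indices((size, size))
    stride = 2 ** (level // 2)
    mask = (i % stride == 0) & (j % stride == 0)
    if level % 2 == 1:
        # Odd level: keep one colour of the checkerboard on the stride lattice.
        mask &= ((i // stride + j // stride) % 2 == 0)
    return mask

for level in range(4):
    print(level, decimation_mask(64, level).sum())
```

For a $64 \times 64$ lattice the number of observed sites halves at each level (4096, 2048, 1024, 512), consistent with decimation by a factor $2^n$.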

The results from Monte-Carlo are presented for a 
$64 \times 64$ system at its critical point in figure 3. The irrelevant 
and marginal eigenvalues of the metric continue to 
behave much as the eigenvalues of the metric in the dif- 
fusion equation, becoming progressively less important 
under coarsening with characteristic eigenvalues. How- 
ever, the large eigenvalues, dominated by singular cor- 
rections, do not become smaller under coarsening; they 
are measured by their collective effects on the large scale 
behavior, which is primarily informed by large distance 
correlations. In the supplementary text, we use RG anal- 
ysis to explain the scaling of the FIM eigenvalues with 
the coarse-graining level. The analysis clarifies that 'rel- 
evant' directions in the RG are exactly those whose FIM 
eigenvalues do not contract on coarsening. They con- 
trol the large-wavelength fluctuations of the model, and 
they dominate the behavior provided that the correlation 
length of fluctuations is larger than the observation scale. 

We have seen that neither the hopping model nor the 
Ising model is sloppy at its microscopic scale. It is 
only upon coarsening the observables, either by allowing 
several time steps to pass, or by only observing a subset of 
lattice sites, that a typical sloppy spectrum of parameter 
combinations emerges. Correspondingly, multiparame- 
ter models such as in systems biology and other areas 
of science are sloppy only when fit to experiments that 
probe collective behavior; if experiments are designed 
to measure one parameter at a time, no hierarchy can be 
expected [23, 24]. In the models examined here, there is 
a clear distinction between the short time or length scale 
of the microscopic theory, and the long time or length 
scale of observables. As we show more formally in the 
supplementary text, sloppiness can be precisely traced 
to the ratio of these two scales — an important small 
variable. On the other hand, in many other areas of sci- 
ence such a distinction of scales cannot always be made. 
As such, those models cannot be coarsened or reduced 
in the same systematic way using methods readily ap- 
plicable to physics theories (see also [7]). Nonetheless, 
owing to their sloppy FIMs, these models share many of 
the striking implications of the continuum limit and RG 
methods. 

We thank Seppe Kuehn and Stefanos Papanikolaou for 
useful comments and discussions. This work was sup- 
ported by NSF grant DMR 1005479 and a Lewis-Sigler 
Fellowship (BBM). 

FIG. 3: We consider an Ising model of ferromagnetism as 
defined in the text, with 13 parameters describing nearest 
and nearby neighbor couplings (shown in the bottom inset), 
and magnetic field. Observables are spin configurations of all 
spins on a sub-lattice (dark sites in the insets of the top panel). 
Top panel shows one particular spin configuration generated 
by our model, suitably blurred for level > 0 to the average 
spin conditioned on the observed sub-lattice values. As can 
be seen by eye, some information about the configuration is 
preserved by this procedure (the typical size of fluctuations, 
for example), while other information, like the nearest neighbor 
correlation function, is lost. We quantify this by measuring 
the eigenvalues of the FIM of this model as a function of 
coarse-graining level. As this coarsening step only discards 
information, all of the eigenvalues must be non-increasing with 
level. The two largest eigenvalues, whose eigenvectors measure 
$T - T_c$ and the applied field $h$, do not shrink substantially 
under coarsening (supplementary text). Further eigenvalues 



shrink in each step by a power of 2 set by $y_i$, where $y_i$ is the $i$th RG eigenvalue. 

Supplementary Information 

I. INTRODUCTION 

This supplement contains relevant background and computational details to accompany the main text. In section II we provide a pedagogical overview of the information theoretic tools that we use to quantify distinguishability. In section III we apply this formalism to a model of stochastic motion that is described in the main text and provide details of the calculation that underlies figure 2 of the main text. We also provide an asymptotic analysis of the scaling of the FIM's eigenvalues in the limit where coarsening has proceeded for many time steps. In sections IV-VII we discuss the Ising model. In section IV we carefully define our 13 parameter Ising model as briefly described in the main text. In section V we give an outline of our numerical techniques for measuring the FIM, as well as a scaling argument that explains its spectrum before coarsening. In section VI we extend this analysis to the coarsened case. In section VII we give details of our Monte-Carlo techniques, with emphasis on our implementation of 'Compatible Monte-Carlo' pQ. 

II. INFORMATION GEOMETRY AND THE FISHER METRIC 

How different are two probability distributions, $P_1(x)$ and $P_2(x)$? What is the correct measure of distance between them? In this section we give an overview of an information theoretic approach to this question [2-4]. Imagine being given a sequence of independent data points $\{x_1, x_2, \ldots, x_N\}$, with the task of inferring which of the two models would be more likely to have generated the data. As probabilities multiply, the probability that $P_1$ would have generated the data is given by: 

$$\prod_i P_1(x_i) = \exp\left(\sum_i \log P_1(x_i)\right) \qquad (2)$$

and by calculating this for each of the two distributions $P_1(x)$ and $P_2(x)$, we could see which model would be more likely to have produced the observed data. 

How difficult should one expect this task to be? Presuming $N$ to be large we can estimate the probability that a typical string generated by $P_1$ would be produced by $P_1$. To do this we simply take a product similar to that in equation 2 but with each state $x$ entering into the product $N P_1(x)$ times: 

$$\prod_x P_1(x)^{N P_1(x)} = \exp\left(N \sum_x P_1(x) \log P_1(x)\right) = \exp(-N S_1) \qquad (3)$$

where we note that this gives an alternative definition of the familiar entropy $S_1$ of $P_1$ (in nats). We can also ask how likely $P_2$ is to produce a typical ensemble generated by $P_1$. This is just given by: 

$$\prod_x P_2(x)^{N P_1(x)} = \exp\left(N \sum_x P_1(x) \log P_2(x)\right) \qquad (4)$$

We can ask how much more likely a typical ensemble from $P_1$ is to have come from $P_1$ rather than from $P_2$. This is given by: 

$$\prod_x \left(\frac{P_1(x)}{P_2(x)}\right)^{N P_1(x)} = \exp\left(N \sum_x P_1(x) \log\frac{P_1(x)}{P_2(x)}\right) = \exp\left(N D_{KL}(P_1\|P_2)\right) \qquad (5)$$

This defines the Kullback-Leibler Divergence, the statistical measure of how distinguishable $P_1$ is from $P_2$ from its data $x$ [H [5]: 

$$D_{KL}(P_1\|P_2) = \sum_x P_1(x) \log\frac{P_1(x)}{P_2(x)} \qquad (6)$$



This measure has several properties that prevent it from being a proper mathematical distance measure, most obviously that it does not necessarily satisfy $D_{KL}(P_1\|P_2) = D_{KL}(P_2\|P_1)$ $^1$. However, for two 'close-by' models $D_{KL}$ does become symmetric. Consider a continuously parameterized set of models $P_\theta$ where $\theta$ is a set of $N$ parameters $\theta^\mu$. The infinitesimal Kullback-Leibler divergence between models $P_\theta$ and $P_{\theta+\Delta\theta}$ takes the form $^2$: 

$$D_{KL}(P_\theta, P_{\theta+\Delta\theta}) = \frac{1}{2}\, g_{\mu\nu}\,\Delta\theta^\mu \Delta\theta^\nu + O(\Delta\theta^3) \qquad (7)$$

where $g_{\mu\nu}$ is the Fisher Information Matrix (FIM), given by: 

$$g_{\mu\nu}(P_\theta) = -\sum_x P_\theta(x)\,\frac{\partial}{\partial\theta^\mu}\frac{\partial}{\partial\theta^\nu} \log P_\theta(x) \qquad (8)$$

X 

The quadratic form of the KL-divergence at short dis- 
tances motivates using the FIM as a metric on param- 
eter space. This defines a Riemannian manifold 3 where 
each point on the manifold specifies a probability distribution $^3$. 
The tensor $g_{\mu\nu}$ can be shown to have all of 

1 A distance measure should also satisfy some sort of generalized triangle inequality, at the very least $D(A, B) + D(B, C) \ge D(A, C)$, which is also not necessarily satisfied here. 

2 It is an interesting exercise to show that there is no term linear in $\Delta\theta$. The crucial step uses that $P_\theta$ is a probability distribution so that $\partial_\mu \sum_x P_\theta(x) = 0$. 

3 Although typical models contain internal singularities, where the metric has eigenvalues that are 0 (see [IDE]). 

the necessary requirements to be a metric: it is symmetric 
(derivatives commute) and positive semi-definite 
(intuitively because no model can fit any model better than 
that model fits itself). It also has the correct transformation 
laws under a reparameterization of the parameters 
$\theta$. Distance on this manifold is (at least locally) a measure 
of how distinguishable two models are from their 
data, in dimensionless units of standard deviations. This 
already gives one important difference between informa- 
tion geometry and the more familiar use of Riemannian 
geometry in General Relativity. In General Relativity 
distances are dimensionful, measured in meters. While 
certain functions of the manifold (notably the Scalar cur- 
vature) are dimensionless and can appear in interesting 
ways on their own, a distance is only large or small when 
compared to some other distance. In information geometry, 
by contrast, distances have an intrinsic meaning: 
probability distributions are distinguishable from a typical 
measurement provided the distance between them is 
greater than one. Below we consider the metric for two 
special cases. 
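
As a small numerical check of the statements above, the sketch below uses a two-outcome (Bernoulli) model, chosen here only for illustration and not taken from the paper. With the FIM defined as in equation 8, the divergence between nearby models is symmetric and its quadratic coefficient is one half of $g$, while for well-separated models the two orderings of $D_{KL}$ differ. All function names are hypothetical.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) in nats for discrete distributions with full support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def bernoulli(theta):
    """Two-outcome distribution with P(x = 1) = theta."""
    return np.array([1.0 - theta, theta])

def bernoulli_fim(theta):
    """Fisher information -sum_x P d^2 log P / d theta^2 for the model above."""
    return 1.0 / (theta * (1.0 - theta))

theta, delta = 0.3, 1e-3
forward = kl_divergence(bernoulli(theta), bernoulli(theta + delta))
backward = kl_divergence(bernoulli(theta + delta), bernoulli(theta))
quadratic = 0.5 * bernoulli_fim(theta) * delta ** 2
print(forward, backward, quadratic)  # nearly equal for nearby models

far_forward = kl_divergence(bernoulli(0.1), bernoulli(0.5))
far_backward = kl_divergence(bernoulli(0.5), bernoulli(0.1))
print(far_forward, far_backward)     # clearly different for distant models
```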



A. The metric of a Gaussian model 

First, motivated by non-linear least squares we consider a model whose output is a vector of data, $y^i$ (for $1 \le i \le M$). Underlying least squares is the assumption that observed data is normally distributed with width $\sigma^i$ around a parameter dependent value, $y_0^i(\theta)$. As such, the 'cost' or sum of squared residuals is proportional to the log of the probability of the model having produced the data. We write the probability distribution of data $y$ given a set of parameters $\theta$ as: 

$$P_\theta(y) \sim \exp\left(-\sum_i \frac{\left(y^i - y_0^i(\theta)\right)^2}{2(\sigma^i)^2}\right) \qquad (9)$$

Defining the Jacobian between parameters and scaled data as: 

$$J_{i\mu} = \frac{1}{\sigma^i}\,\frac{\partial y_0^i(\theta)}{\partial\theta^\mu} \qquad (10)$$

The Fisher information for least squares problems is simply given by [BJ [7]: 

$$g_{\mu\nu} = \sum_i J_{i\mu} J_{i\nu} \qquad (11)$$

This particular metric has a geometric interpretation: distance is locally the same as that measured by embedding the model in the space of scaled data according to the mapping $y_0(\theta)$ (it is induced by the Euclidean metric in data space). It is exactly this metric that was shown to be sloppy in seventeen models from the systems biology literature [SHE]. 
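
To illustrate the least-squares metric, the following sketch builds $g = J^T J$ for a toy sum of decaying exponentials, $y_0(t) = \sum_\mu e^{-\theta^\mu t}$. The decay rates, sampling times, unit uncertainties and function names are all assumptions made for this example; they are not the models analysed in the paper.

```python
import numpy as np

def least_squares_fim(jacobian, sigma):
    """g = J^T J, where each row of the Jacobian is scaled by 1 / sigma_i."""
    scaled = jacobian / np.asarray(sigma, dtype=float)[:, None]
    return scaled.T @ scaled

def decay_jacobian(rates, times):
    """d y_0(t_i) / d rate_mu for y_0(t) = sum_mu exp(-rate_mu * t)."""
    rates = np.asarray(rates, dtype=float)
    times = np.asarray(times, dtype=float)
    return -times[:, None] * np.exp(-np.outer(times, rates))

times = np.linspace(0.0, 5.0, 50)        # assumed measurement times
rates = np.array([0.5, 1.0, 1.5, 2.0])   # assumed decay rates
g = least_squares_fim(decay_jacobian(rates, times), np.ones_like(times))
eigenvalues = np.linalg.eigvalsh(g)[::-1]  # sorted from stiffest to sloppiest
print(eigenvalues / eigenvalues[0])
```

Even with only four parameters, the normalized eigenvalues printed here fall off over several decades.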



B. The metric of a Stat-Mech Model 

Second, we consider the case of an exponential model, familiar from statistical mechanics, defined by a parameter dependent Hamiltonian that assigns an energy to every possible configuration, $x$. (We set the temperature as well as Boltzmann's constant to 1.) Each parameter $\theta^\mu$ controls the relative weighting of some function of the configuration, $\Phi_\mu(x)$, which together define the probability distribution on configurations through: 

$$P(x|\theta) = \exp(-H_\theta(x))/Z(\theta), \qquad Z(\theta) = \exp(-F(\theta)) = \sum_x \exp(-H_\theta(x)), \qquad H_\theta(x) = \theta^\mu \Phi_\mu(x) \qquad (12)$$

Though perhaps unfamiliar, typical models can be put into this form. For example, the 2D Ising model of section IV has spins $s_{i,j} = \pm 1$ on a square $L \times L$ lattice with the configuration, $x = \{s_{i,j}\}$, being the state of all spins. The magnetic field, $\theta^0 = h$, multiplies $\Phi_0(\{s_{i,j}\}) = \sum_{i,j} s_{i,j}$ and the nearest neighbor coupling, $\theta^1 = J$, multiplies $\Phi_1(\{s_{i,j}\}) = \sum_{i,j} \left(s_{i,j}\, s_{i+1,j} + s_{i,j}\, s_{i,j+1}\right)$. This form is chosen for convenience in calculating the metric, which is written [9, 10] $^5$: 

$$g_{\mu\nu} = \langle -\partial_\mu \partial_\nu \log P(x) \rangle = \langle \partial_\mu \partial_\nu H(x) \rangle + \partial_\mu \partial_\nu \log Z = \partial_\mu \partial_\nu \log Z = -\partial_\mu \partial_\nu F \qquad (13)$$

To write the last line we have taken advantage of the fact that the Hamiltonian is linear in parameters $\theta^\mu$ so that $\langle \partial_\mu \partial_\nu H(x) \rangle = 0$. As such, the last line does not transform like a metric under an arbitrary reparameterization, but only one that preserves the form given in equation 12. 
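
For a Hamiltonian that is linear in its parameters, the second derivatives of $\log Z$ equal the covariances of the functions $\Phi_\mu$. The sketch below checks this by exact enumeration on a tiny periodic $3 \times 3$ lattice with only a field and a nearest neighbor coupling. The lattice size, parameter values and function names are assumptions chosen to keep the enumeration small; this is not the 13 parameter model or the Monte-Carlo procedure of the paper.

```python
import itertools
import numpy as np

L = 3  # assumed lattice size, small enough to enumerate all 2**9 states

def features(spins):
    """Phi_0 = sum of spins; Phi_1 = sum over nearest-neighbor products."""
    s = np.asarray(spins).reshape(L, L)
    phi0 = s.sum()
    phi1 = (s * np.roll(s, 1, axis=0)).sum() + (s * np.roll(s, 1, axis=1)).sum()
    return np.array([phi0, phi1], dtype=float)

PHI = np.array([features(c) for c in itertools.product([-1, 1], repeat=L * L)])

def log_z(theta):
    """log Z for P(x) = exp(-theta . Phi(x)) / Z."""
    return float(np.log(np.sum(np.exp(-PHI @ theta))))

def fim_covariance(theta):
    """FIM as the covariance matrix of Phi under the Boltzmann distribution."""
    weights = np.exp(-PHI @ theta)
    prob = weights / weights.sum()
    centered = PHI - prob @ PHI
    return (centered * prob[:, None]).T @ centered

def fim_finite_difference(theta, h=1e-3):
    """FIM as the Hessian of log Z, from central finite differences."""
    n = len(theta)
    g = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            ea, eb = np.eye(n)[a] * h, np.eye(n)[b] * h
            g[a, b] = (log_z(theta + ea + eb) - log_z(theta + ea - eb)
                       - log_z(theta - ea + eb) + log_z(theta - ea - eb)) / (4 * h * h)
    return g

theta = np.array([0.05, -0.2])  # assumed field and coupling values
print(fim_covariance(theta))
print(fim_finite_difference(theta))
```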



III. A CONTINUUM LIMIT: DIFFUSION 

With these definitions in hand, we turn to a specific 
problem where information about microscopic details is 



This particular metric has a geometric interpretation: 
distance is locally the same as that measured by embed- 
ding the model in the space of scaled data according to 



4 This assumes that the uncertainty $\sigma^i$ does not depend on the parameters, and that errors are diagonal. Both of these assumptions seem reasonable for a wide class of models, for example if measurement error dominates. The more general case is still tractable, but less transparent. 

5 Several seemingly reasonable metrics can be defined for systems in statistical mechanics and all give similar results in most circumstances [10]. Most differences occur either for systems not in a true thermodynamic ($N$ large) limit, or for systems near a critical point. As far as we are aware, Crooks [9] was the first to stress that the one used here can be derived from information theoretic principles, perhaps making it the most 'natural' choice. In [9] Crooks showed that when using this metric 'length' has an interesting connection to dissipation by way of the Jarzynski equality. 



lost in a coarse-grained description. A prototypical ex- 
ample of such a continuum limit is the emergence of the 
diffusion equation in a system consisting of small parti- 
cles undergoing stochastic motion. Diffusion effectively 
describes the motion of a particle provided that there is 
translation invariance in time and space and that par- 
ticle number is conserved. Microscopic parameters that 
describe details of the medium in which the particle is 
diffusing and the molecular details of such an object en- 
ter into this continuum description only through their 
effects on the diffusion constant, or, if it is present, the 
rate of drift. Furthermore, knowing molecular details 
(for example the bond angle of a water molecule in the 
medium through which a particle is diffusing) that might 
enter into a microscopic description of the motion would 
be extremely unhelpful in predicting a particle's diffusion 
constant. 

To see how this comes about we consider a 'micro- 
scopic' model of stochastic motion on a discrete lattice 
of sites j. Our model is defined by 2N + 1 parameters 9^, 
for — N < p < N which describe the probability that in 
a discrete time step a particle will hop from site j to site 
j + [j*. We presume that we start our particles from a dis- 
tribution po(j) 7 and that our measurement data consists 
of the number of particles at some later time t, pt{j)- 

We first consider taking 'microscopic' measurements of 
our model parameters, by starting with an initial prob- 
ability distribution po(j) = ^j,0j an d observing the dis- 
tribution after one time step, pi(j). This distribution is 
just given by: 



(14) 



Presuming our measurement uncertainty of the num- 
ber of particles at each site is Gaussian, with width 6 
<T meas = 1. we can calculate the Fisher metric on the 
parameter space using the Least Squares metric defined 



in equations 10 and 11 



9nv — zZi 



(15) 



origin. We next examine the behavior of the FIM for data 
that is in the form of densities measured after multiple 
time steps. 



A. Coarsening the diffusion equation by observing 
at long times 

The molecular timescale is typically much faster than 
the typical timescale of a measurement. We ask how our 
ability to measure microscopic parameters changes with 
experiment time. 

To calculate the density of particles at position j and 
time t, pt{j), it is useful to introduce the Fourier trans- 
form of the hopping rates, as well as the Fourier trans- 
form of the particle density at time t: 



N 

Qk _ e ~ik^Qfi 

oo 

~k = £ e -ikj pt{j) 

j=—oo 



(16) 



Ptti) = h I dke * k3 pt 



In a time step the density distribution is convoluted by 
the hopping rates. In Fourier space this is simply written 



p'i 



~ t ~ p k_ x 



(17) 



We choose initial conditions with all particles at the ori- 
gin p (j) = 5 j>0 , so that: 



p\ 



k\i 



fe) 



(18) 



Pt(j) = £ J dke^(9 k Y 

— TV 

The Jacobian and metric at time t can now be written 



4 =9nPtV) = &J dfce**Ci-">(0*)*-i 

— 7T 



9uv 



(19) 



This metric has 2N+1 eigenvalues each with value A = 1. 
All of the parameters in this model are measurable with 
equal accuracy. Additionally, if we wanted to understand 
the behavior at this microscopic level, there is no reason 
to think that a reduced description of the model should 
be possible; each direction in parameter space is equally 
important in determining the one step evolution from the 



6 We could carry out a more complicated calculation assuming 
our uncertainty comes from the stochastic nature of the model 
itself, but presuming we start with many particles, this approach 
would yield similar but less transparent results. Changing the 
measurement uncertainty from 1 to $\sigma_{\rm meas}$ will multiply all calculated metrics by a trivial factor of $1/\sigma_{\rm meas}^2$ and is omitted for clarity.



7 This is due to the convolution theorem. See, for example, [12].

The metric now depends on the $\theta$ themselves. Presuming the (positive) hopping rates $\theta^\mu$ sum to 1 with at least two non-zero, then all of the $\theta$ values are less than one and the late time behavior of $g^t_{\mu\nu}$ is dominated by small $k$ values appearing in the integrand (equation 19). At small values of $k$:

$$ \begin{aligned} \tilde\theta^k &= 1 - ikv - \frac{k^2}{2}\Lambda + O(k^3) \\ &= \exp\left(-ikv - \frac{Dk^2}{2}\right) + O(k^3) \\ D &= \Lambda - v^2 \end{aligned} \tag{20} $$

where in going from the first line to the second we note that these two expressions are the same to second order in $k$. Here $v$ is the drift and $D$ is the diffusion constant. From this approximation we can estimate the form of $g^t_{\mu\nu}$ for late times. For the case where the drift $v = 0$:



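$$ g^t_{\mu\nu} \sim \frac{t^2}{(Dt)^{1/2}} \exp\left(-\frac{(\mu-\nu)^2}{4Dt}\right) \tag{21} $$

We can expand this in powers of the small parameter $(\mu-\nu)^2/Dt$. This gives

$$ \begin{aligned} g^t_{\mu\nu} &\sim t^2 \left( (Dt)^{-1/2} - (Dt)^{-3/2} (\mu-\nu)^2/4 + \cdots \right) \\ &= t^2 \sum_{n=0}^{\infty} \frac{(-1)^n}{n!\, 4^n}\, (Dt)^{-n-1/2}\, (\mu-\nu)^{2n} \end{aligned} \tag{22} $$

Each term in the series contributes a single new non-zero eigenvalue which scales like:

$$ \lambda_n \sim t^2\, (Dt)^{-n-1/2} \tag{23} $$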



The corresponding eigenvectors are best understood by considering their projection onto the observables. These are proportional to the left singular vectors of $J$, $v_{L,n} = (1/\lambda_n)\, J_{j\mu} v_n^\mu$. These are exactly the Hermite polynomials of a Gaussian with width $2\sigma = \sqrt{Dt}$. The first one measures non-conservation of particle number, $R$, the second measures drift, $v$, and the third measures changes in the diffusion coefficient, $D$. The next terms are less familiar; those past $n = 2$ never appear in a continuum description, because they are always harder to observe than the diffusion constant by a factor of the ratio of the observation scale ($\sqrt{Dt}$) to the microscopic scale ($N$) raised to a positive integer power. It is not possible for the diffusion constant, as defined here, to be 0 while any higher cumulants are non-zero, which explains why, though drift and the diffusion constant both appear in continuum limits, the physical parameter that describes the third cumulant does not. The next eigendirection measures the skew of the resulting density distribution, while the next one measures the distribution's kurtosis, and so on. It is worth noting that careful observation of a particular $\theta^\mu$, somewhat analogous to knowing the bond-angle of a water molecule, would give very little insight into the relevant observables. The exact eigenvalues, measured at steps $t = 1$ to $7$, are plotted in figure 2 of the main text for an $N = 3$ (seven parameter) model where $\theta^\mu = 1/7$ for all $\mu$.
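As a rough numerical check of this hierarchy, the following Python sketch builds the Jacobian of equation 19 in real space, using the fact that $\rho_t$ is the $t$-fold convolution of the hopping rates with a delta function at the origin, and prints the eigenvalues of $g^t_{\mu\nu}$ for the $N = 3$ model with $\theta^\mu = 1/7$. The function name `hopping_fim` and the use of NumPy are illustrative assumptions; this is not the code used to produce figure 2.

```python
import numpy as np

def hopping_fim(theta, t):
    """Fisher metric g^t of the hopping model after t steps (sigma_meas = 1).

    theta holds the 2N + 1 hopping rates; all particles start at the origin.
    """
    theta = np.asarray(theta, dtype=float)
    n_par = theta.size
    # rho_{t-1}: (t - 1)-fold convolution of theta, starting from a delta function
    rho = np.array([1.0])
    for _ in range(t - 1):
        rho = np.convolve(rho, theta)
    # J[j, mu] = d rho_t(j) / d theta^mu = t * rho_{t-1}(j - mu)
    n_obs = rho.size + n_par - 1
    jac = np.zeros((n_obs, n_par))
    for mu in range(n_par):
        jac[mu:mu + rho.size, mu] = t * rho
    return jac.T @ jac

theta = np.full(7, 1.0 / 7.0)  # N = 3, seven equal hopping rates
for t in range(1, 8):
    eigs = np.sort(np.linalg.eigvalsh(hopping_fim(theta, t)))[::-1]
    print(t, np.array2string(eigs, precision=3))
```

At $t = 1$ the metric is the identity, and at later times the eigenvalues spread over many decades, in line with the scaling argument above.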



IV. A CRITICAL POINT: THE ISING MODEL

The success of the continuum limit might be said to rest on the 'boringness' of the large-scale behavior. All of the fluctuations in the system are essentially averaged at the scale of typical observations. This fails to be true near critical points of systems, where fluctuations remain large up to a characteristic scale $\xi$ which diverges at the critical point itself. Perhaps surprisingly, even at these points these systems have behavior that is universal. The Ising model, for example, provides a quantitative description of both ferromagnetic and liquid-gas critical points, describing all of the statistics of the observable fluctuations of both systems, even though they have entirely different microscopic components. Just as in diffusion, the observed behavior at these points can then be described by just a few 'relevant' parameters (two in the Ising model: the bond strength and the magnetic field).

The Ising model discussed here takes place on a square lattice (with lattice sites $1 \le i,j \le L$), with degrees of freedom $s_{i,j}$ taking the values of $\pm 1$. The probability of observing a particular configuration on the whole lattice (denoted by $\{s_{i,j}\}$) is defined by a Hamiltonian $H(\{s_{i,j}\})$ that assigns each configuration of spins an energy (see equation 12).

The usual nearest neighbor Ising model has two parameters, a coupling strength ($J$) and a magnetic field ($h$), through the equation:

$$ H(\{s_{i,j}\}) = J \sum_{i,j} \left( s_{i,j} s_{i,j+1} + s_{i,j} s_{i+1,j} \right) + h \sum_{i,j} s_{i,j} \tag{24} $$

Here we consider a larger dimensional space of possible models, by including in our Hamiltonian the magnetic field ($\theta^h$), the usual nearest neighbor coupling term, and 12 'nearby' couplings parameterized by $\theta^{\alpha\beta}$. We additionally allow the vertical and horizontal couplings to be different. In the form of equation 12:

$$ \begin{aligned} H(x) &= \sum_{\alpha\beta} \theta^{\alpha\beta}\, \Phi_{\alpha\beta}(\{s_{i,j}\}) + \theta^h\, \Phi_h(\{s_{i,j}\}) \\ \Phi_{\alpha\beta}(\{s_{i,j}\}) &= \sum_{i,j} s_{i,j}\, s_{i+\alpha,j+\beta} \\ \Phi_h(\{s_{i,j}\}) &= \sum_{i,j} s_{i,j} \end{aligned} \tag{25} $$

We calculate the metric along the line through parameter space that describes the usual Ising model (where $\theta^{01} = \theta^{10} = J$ and $\theta^{\alpha\beta} = 0$ otherwise) in zero magnetic field ($\theta^h = 0$).

V. MEASURING THE ISING METRIC

Using equation 13 we can rewrite the metric in terms of expectation values of observables (where except when necessary we condense the indexes $\alpha\beta$ and $h$ into a single index $\mu$):

$$ g_{\mu\nu} = \partial_\mu \partial_\nu \log Z = \langle \Phi_\mu \Phi_\nu \rangle - \langle \Phi_\mu \rangle \langle \Phi_\nu \rangle \tag{26} $$

Furthermore, given a configuration $x = \{s_{i,j}\}$ we can readily calculate $\Phi_\mu(x)$, which is just a particular two point correlation function (or the total sum of spins for $\Phi_h$).

To estimate the distribution defined in equation 26 we used the Wolff algorithm [13] to very efficiently generate an ensemble of configurations $x_p = \{s_{i,j}\}_p$, for $1 \le p \le M$, for systems with $L = 64$. We also exactly enumerated all possible states on lattices up to $L = 4$ to compare with our Monte-Carlo results (not shown).

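With our ensemble of $M$ lattice configurations, $x_p$, we thus measure:

$$ g_{\mu\nu} = \frac{1}{M^2 - M} \sum_{\substack{p,q=1 \\ p \ne q}}^{M} \left( \Phi_\mu(x_p)\, \Phi_\nu(x_p) - \Phi_\mu(x_p)\, \Phi_\nu(x_q) \right) \tag{27} $$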
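The measurement described above can be sketched in a few lines of Python. The sketch assumes periodic boundary conditions and spins stored as a NumPy array of $\pm 1$; the helper names `observables` and `metric_estimate` and the list of shifts are hypothetical and not taken from the original analysis. The estimator is the unbiased sample covariance of the observables, which is algebraically the same as a double sum over distinct pairs of configurations normalised by $M^2 - M$.

```python
import numpy as np

def observables(spins, shifts):
    """Return [Phi_h, Phi_ab for each (a, b) in shifts] for one configuration."""
    phi = [spins.sum()]  # Phi_h: total sum of spins
    for a, b in shifts:
        # translated lattice s'_{i,j} = s_{i+a, j+b}, with periodic wrap-around
        shifted = np.roll(spins, shift=(-a, -b), axis=(0, 1))
        phi.append((spins * shifted).sum())  # Phi_ab: two point correlation
    return np.array(phi, dtype=float)

def metric_estimate(phis):
    """Unbiased covariance estimate of g from an (M, n_params) array."""
    m = phis.shape[0]
    centred = phis - phis.mean(axis=0)
    return centred.T @ centred / (m - 1)

# Usage with random (infinite temperature) configurations standing in for
# Wolff-generated ones; only the bookkeeping is illustrated here.
rng = np.random.default_rng(0)
shifts = [(0, 1), (1, 0), (1, 1)]
sample = np.array([observables(rng.choice([-1, 1], size=(8, 8)), shifts)
                   for _ in range(200)])
g = metric_estimate(sample)
print(np.linalg.eigvalsh(g))
```

In a real calculation the random configurations would be replaced by equilibrated samples at the coupling of interest.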




The results are plotted in figure 4. Away from the critical point in the high temperature phase (small $\beta J$) the results seem somewhat analogous to those we found for the diffusion equation viewed at its microscopic scale. All of the parameters that control two spin couplings ($\theta^{\alpha\beta}$) are roughly as distinguishable as each other, with $\theta^h$ having different units. However, as the critical point is approached, the system becomes extremely sensitive both to $\theta^h$ and to a certain combination of the $\theta^{\alpha\beta}$ parameters. This divergence has been previously shown for the continuum Ising universality class [10]. In fact, as we will see in the next section, these two metric eigenvalues diverge with the scaling of the susceptibility ($\chi \sim \xi^{7/4}$, whose eigenvector is simply $\theta^h$) and specific heat ($C \sim \log(\xi)$, whose eigenvector is a combination of $\theta^{\alpha\beta}$ proportional to the gradient of the critical temperature, $\partial T_c / \partial \theta^{\alpha\beta}$), respectively. From an information theoretic point of view, these two parameter combinations seem to become particularly easy to measure near the critical point because the system's behavior becomes extremely sensitive to them. The behavior of these two eigenvalues seems to have no parallel in the diffusion equation viewed at its microscopic scale.

FIG. 4: The eigenvalues of the metric for the enlarged 13 parameter Ising model described in the text are plotted along the line defined by the usual Ising model with $\beta J$ as the only parameter, and $h = 0$. Two parameter combinations become large near the critical point, each diverging with characteristic exponents describing the divergence of the susceptibility and specific heat respectively. The other eigenvalues vary smoothly as the critical point is crossed, and furthermore they have a characteristic scale and are neither evenly spaced nor widely distributed in log.

8 $\Phi_h(\{s_{i,j}\}) = \sum_{i,j} s_{i,j}$ is very simple and efficient to calculate for a given configuration $\{s_{i,j}\}$. $\Phi_{\alpha\beta}(\{s_{i,j}\})$ is only slightly harder. One defines the translated lattice $s'_{i,j}(\alpha,\beta) = s_{i+\alpha,j+\beta}$, in terms of which we write $\Phi_{\alpha\beta}(\{s_{i,j}\}) = \sum_{i,j} s_{i,j}\, s'_{i,j}(\alpha,\beta)$.

A. Scaling analysis of the Eigenvalue spectrum

To understand our Monte Carlo results for the eigenvalues of the metric, we apply a more standard renormalization group analysis to our calculation. To do this it is useful to use the form $g_{\mu\nu} = -\partial_\mu \partial_\nu F$ (see equation 13), and in particular we focus on the critical region, close to the renormalization group fixed point $\theta_0$. After a renormalization group transformation that reduces lengths by a factor of $b$, the remaining degrees of freedom are described by an effective theory with parameters $\theta'$ related to the original ones by the relationship $\theta'^\mu - \theta_0^\mu = T^\mu_\nu (\theta^\nu - \theta_0^\nu)$, where $T$ has left eigenvectors and eigenvalues given by $e_{\alpha,\mu}$ and $b^{y_\alpha}$. It is convenient to switch to the so-called scaling variables, $u_\alpha = e_{\alpha,\mu} (\theta^\mu - \theta_0^\mu)$, which have the property that under a renormalization group transformation

$$ u'_\alpha = b^{y_\alpha} u_\alpha \tag{28} $$

It is also convenient to divide our free energy into a singular piece and an analytic piece, so that:



$$ \begin{aligned} F(\theta) &= A f_s(u_\alpha(\theta)) + A f_a(u_\alpha(\theta)) \\ f_s &= u_1^{d/y_1}\, \mathcal{U}(r_0, \ldots, r_\alpha) \\ r_\alpha &= u_\alpha / u_1^{y_\alpha/y_1} \end{aligned} \tag{29} $$

where the $f$s are free energy densities, $A$ is the system size and where $f_a$ and $\mathcal{U}$ are both analytic functions of their arguments. Notice that by construction the $r$s do not transform under an RG transformation.

9 $\theta'^\mu - \theta_0^\mu = T^\mu_\nu (\theta^\nu - \theta_0^\nu)$ is strictly true only if the parameters span the space of possible Ising Hamiltonians, but our analysis holds for $g_{\mu\nu}$ on the space of the original parameters provided the $\theta'$ span all possible models, which we can assume in this analysis. Said differently, there is no need for $T$ to be square, and it is sufficient for the analysis presented above to assume that $T$ is 13 by infinite dimensional.

The Fisher Information can be similarly divided into two parts, yielding:



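$$ \begin{aligned} g_{\mu\nu} &= g^s_{\mu\nu} + g^a_{\mu\nu} \\ g^s_{\mu\nu} &= A \sum_{\alpha,\beta} \frac{\partial u_\alpha}{\partial \theta^\mu} \frac{\partial u_\beta}{\partial \theta^\nu}\, \xi^{\,y_\alpha + y_\beta - d}\, \mathcal{M}^s_{\alpha\beta} \\ g^a_{\mu\nu} &= A \sum_{\alpha,\beta} \frac{\partial u_\alpha}{\partial \theta^\mu} \frac{\partial u_\beta}{\partial \theta^\nu}\, \mathcal{M}^a_{\alpha\beta} \end{aligned} \tag{30} $$

where $\xi$ is the correlation length, which diverges like $\xi \sim u_1^{-1/y_1}$. Both $\frac{\partial u_\alpha}{\partial \theta^\mu} \frac{\partial u_\beta}{\partial \theta^\nu} \mathcal{M}^s_{\alpha\beta}$ and $\frac{\partial u_\alpha}{\partial \theta^\mu} \frac{\partial u_\beta}{\partial \theta^\nu} \mathcal{M}^a_{\alpha\beta}$ are tensors in parameter space with two lower indices that are expected to vary smoothly as their argument is changed, with no divergent or singular behavior, and eigenvalues that all take a characteristic scale. As such, we expect that as the critical point is approached the matrix's eigenvalues will scale like:

$$ \lambda^s_i \sim A\, \xi^{2y_i - d}, \qquad \lambda^a_i \sim A \tag{31} $$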



As the critical point is approached we expect the singular piece to dominate provided $2y_i - d \ge 0$. In the 2D Ising model, this is true for the magnetic field, which as the critical point is approached becomes the largest eigenvector $e_0 = h$ (with $y_h = 15/8$), and for the eigenvector given by $e_1 = \partial_\mu u_1$ whose eigenvalue is $y_1 = 1$ (in this case $2y_1 - d = 0$ and there is a logarithmic divergence, as with the Ising model's specific heat). The remaining eigenvectors of $g_{\mu\nu}$ are dominated by analytic contributions. These analytic contributions, just as in the diffusion equation viewed at its fundamental scale, cause the corresponding eigenvalues to cluster together at a characteristic scale and not exhibit sloppiness (though not necessarily to be exactly the identity). This analysis agrees with the Monte Carlo results plotted in figure 4.



VI. MEASURING THE ISING METRIC AFTER COARSENING

The diffusion equation became sloppy only after coarsening. Viewed at its microscopic scale all parameters could be inferred with exactly the same precision. However, when observed at a time or length scale much larger than this microscopic scale a hierarchy of importance developed, with particle non-conservation being most visible, drift being the next most dominant term and the diffusion constant being the next most observable parameter. Further parameters became geometrically less important, justifying the use of an effective continuum model containing just the first of these parameters with a non-zero value.

What happens in the Ising model? Does a similar hierarchy develop? Do the 'relevant' parameters in the Ising model behave differently under coarsening from the irrelevant ones? To answer these questions we ask how well we could infer microscopic parameters of the model from data that is coarsened in space$^{10}$. In particular, we restrict our measurements to observations of spins that remain after an iterative checkerboard decimation procedure$^{11}$. In the usual RG picture a new effective Hamiltonian is constructed that describes the observable behavior at these lattice sites. Here we instead calculate the Fisher Information Matrix in the original parameters, but only using information remaining at the new, coarsened level.

Specifically, we measure $g^n_{\mu\nu} = -\langle \partial_\mu \partial_\nu \log(P(x^n)) \rangle$ where $x^n = \{s_{i,j}\}$ for $\{i,j\}$ in level $n$. The levels are defined as follows: if $n$ is even then $\{i,j\}$ is in level $n$ if and only if $i/2^{n/2}$ and $j/2^{n/2}$ are both integers. If $n$ is odd then $\{i,j\}$ is in level $n$ if and only if $\{i,j\}$ is in level $n-1$ and $(i+j)/2^{(n+1)/2}$ is an integer. The first level is thus a checkerboard, the second has only even sites, the third has a checkerboard of even sites, etc. We define the mapping to level $n$, determined by the configuration of all spins $x$ at level 0, as $x^n = C^n(x)$$^{12}$. It is useful to write $P(x^n)$ in terms of a restricted partition function:

$$ \begin{aligned} P(x^n) &= Z(x^n)/Z \\ Z(x^n) &= \sum_x \exp(-H(x))\, \delta(C^n(x) = x^n) \end{aligned} \tag{32} $$

where $Z(x^n)$ is the coarse-grained partition function conditioned on the sub-lattice at level $n$ taking the value $x^n$ while summing over the remaining degrees of freedom. We also introduce notation for an expectation value of an operator defined at level 0 over configurations which coarsen to the same configuration $x^n$:

$$ \{Q\}_{x^n} = \frac{\sum_x Q(x)\, \delta(C^n(x) = x^n)\, \exp(-H(x))}{Z(x^n)} \tag{33} $$

10 There is no sense of 'time' in the Ising model, since it does not specify dynamics.

11 We use this checkerboard decimation scheme rather than a block spin scheme (say) as it is easier to implement the Compatible Monte-Carlo described below.

12 The mapping $C^n(x)$ here simply discards all of the spins that do not remain at level $n$, leaving a square lattice for even $n$ and a rotated 'diamond' lattice for odd $n$. However, this formalism would also apply to other schemes, such as the commonly used block-spin procedure.
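The level definitions and the mapping $C^n$ can be sketched as follows. This is a minimal illustration assuming zero-based site indices on an $L \times L$ lattice with $L$ a power of two; the names `in_level` and `coarsen` are hypothetical helpers introduced here.

```python
import numpy as np

def in_level(i, j, n):
    """True if site (i, j) survives n checkerboard decimation steps."""
    if n % 2 == 0:
        step = 2 ** (n // 2)
        return i % step == 0 and j % step == 0
    # odd n: the site must be in level n - 1 and on that sub-lattice's checkerboard
    return in_level(i, j, n - 1) and (i + j) % 2 ** ((n + 1) // 2) == 0

def coarsen(spins, n):
    """C^n(x): boolean mask of surviving sites and the spins observed on them."""
    size = spins.shape[0]
    mask = np.array([[in_level(i, j, n) for j in range(size)]
                     for i in range(size)])
    return mask, spins[mask]

# Each decimation step keeps half of the remaining sites (b = sqrt(2)).
lattice = np.ones((64, 64), dtype=int)
for n in range(5):
    mask, observed = coarsen(lattice, n)
    print(n, observed.size)
```

Each level is a subset of the previous one, so data at level $n + 1$ can always be obtained by discarding spins from data at level $n$.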
We can now rewrite the metric at level $n$ as:

$$ \begin{aligned} g^n_{\mu\nu} &= -\langle \partial_\mu \partial_\nu \log(P(x^n)) \rangle \\ &= \partial_\mu \partial_\nu \log(Z) - \langle \partial_\mu \partial_\nu \log(Z(C^n(x))) \rangle \\ &= g_{\mu\nu} - \langle \{\Phi_\mu \Phi_\nu\}_{C^n(x)} \rangle + \langle \{\Phi_\mu\}_{C^n(x)} \{\Phi_\nu\}_{C^n(x)} \rangle \\ &= \langle \{\Phi_\mu\}_{C^n(x)} \{\Phi_\nu\}_{C^n(x)} \rangle - \langle \{\Phi_\mu\}_{C^n(x)} \rangle \langle \{\Phi_\nu\}_{C^n(x)} \rangle \end{aligned} \tag{34} $$

The quantity $\langle \{\Phi_\mu\}_{C^n(x)} \{\Phi_\nu\}_{C^n(x)} \rangle$ can be measured by taking each member of an ensemble, $x_q$, and generating a sub-ensemble of $x'_{q,r}$ according to the distribution defined by:

$$ P(x'_{q,r} | x_q) = \frac{\exp(-H(x'_{q,r}))\, \delta(C^n(x'_{q,r}) = C^n(x_q))}{Z(C^n(x_q))} \tag{35} $$

Techniques for generating this ensemble, using a form of 'compatible Monte-Carlo' [T], are discussed in section VII.

From an ensemble of $M$ configurations $x_q$ taken from the ensemble of full lattice configurations, and members $x'_{q,r}$ ($1 \le r \le M'$) of the ensemble given by $P(x'_{q,r} | x_q)$ for each $x_q$, we can calculate:

$$ g^n_{\mu\nu} = \frac{1}{M (M'^2 - M')} \sum_{q=1}^{M} \sum_{\substack{r,s=1 \\ r \ne s}}^{M'} \left( \Phi_\mu(x'_{q,r})\, \Phi_\nu(x'_{q,s}) - \frac{1}{M-1} \sum_{\substack{p=1 \\ p \ne q}}^{M} \Phi_\mu(x'_{q,r})\, \Phi_\nu(x'_{p,s}) \right) \tag{36} $$

The results of this Monte Carlo are presented for a 64 x 64 system at its critical point in figure 3 of the main text. The irrelevant and marginal eigenvalues of the metric continue to behave much as the eigenvalues of the metric in the diffusion equation, becoming progressively less important under coarsening with characteristic eigenvalues. However, the large eigenvalues, dominated by singular corrections, do not become smaller under coarsening, presumably because they are measured by their collective effects on the large scale behavior, which is primarily measured from large distance correlations.

A. Eigenvalue spectrum after coarse-graining

To understand the values of the metric we observe after coarsening, we apply a more standard RG-like analysis to our system. We do this by constructing an effective Hamiltonian in a new parameter basis, repeating our analysis for the metric's eigenvalues in the coordinates of the parameters of that Hamiltonian, and finally transforming back into our original coordinates. After coarse-graining for $n$ steps each observation yields the data $x^n = \{s_{i,j}\}$ for $\{i,j\}$ in level $n$, where only the spins $\{i,j\}$ remaining at level $n$ are observed. The probability of observing $x^n$ can be written:

$$ P(x^n) = \frac{\exp(-H^n(x^n))}{Z(A^n, u^n)} \tag{37} $$



where $H^n$ is the effective Hamiltonian after $n$ coarse-graining steps. $H^n$ has new parameters most conveniently written in terms of the scaling variables defined in equation 28, where we can write $u^n_\alpha = b^{y_\alpha n} u_\alpha$. In addition, the area of the system is reduced to$^{13}$ $A^n = b^{-dn} A$ and $\partial u^n_\alpha / \partial \theta^\mu = b^{y_\alpha n}\, \partial u_\alpha / \partial \theta^\mu$.

After rescaling, the entropy of the model is smaller by an amount $\Delta S^n$ from the original model's entropy. It is customary in RG analysis to subtract this constant from the Hamiltonian, so as to preserve the free energy of the system after rescaling:

$$ F^n = F^{n,s} + F^{n,a} + \Delta S^n = F^s + F^a = F \tag{38} $$



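Note that the new model's Hamiltonian would still be linear in these new parameters, allowing us to use the algebra of equation 13 if we were to remove the constant $\Delta S^n$ from the new Hamiltonian. This would of course be an identical model, since the addition of a constant to the energy does not change any observables. This change allows us to express the metric for the new observables in terms of the original parameters, taking

$$ g^n_{\mu\nu} = \partial_\mu \partial_\nu (F^{n,s} + F^{n,a}) = \partial_\mu \partial_\nu (F^s + F^a - \Delta S^n) \tag{39} $$

After some algebra we see that:

$$ \begin{aligned} g^{n,s}_{\mu\nu} &= \partial_\mu \partial_\nu F^{n,s} = \partial_\mu \partial_\nu F^s = g^s_{\mu\nu} \\ g^{n,a}_{\mu\nu} &= \partial_\mu \partial_\nu F^{n,a} = b^{-dn} A\, \partial_\mu \partial_\nu f^{n,a} \\ &= b^{-dn} A\, \frac{\partial u^n_\alpha}{\partial \theta^\mu} \frac{\partial u^n_\beta}{\partial \theta^\nu}\, \mathcal{M}^a_{\alpha\beta}(u^n) \end{aligned} \tag{40} $$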



The singular piece is exactly maintained, as the singular part of the free energy is preserved after an RG step. This means that the singular piece of the free energy is exactly the piece which describes information carried at long wavelengths. On the other hand, the analytic piece is smaller by $\partial_\mu \partial_\nu \Delta S^n$. The matrix $\frac{\partial u_\alpha}{\partial \theta^\mu} \frac{\partial u_\beta}{\partial \theta^\nu} \mathcal{M}^a_{\alpha\beta}(u^n)$ should be smoothly varying, as $u^n$ varies a small amount with $n$. Importantly, all of its eigenvalues should continue to take a characteristic value. Thus, after $n$ rescalings:

$$ \lambda^{n,s}_i \sim A\, \xi^{2y_i - d}, \qquad \lambda^{n,a}_i \sim A\, b^{(2y_i - d) n} \tag{41} $$

13 We keep our rescaling factor $b$ general here, but in our system $b = \sqrt{2}$.



To ensure that the Fisher information is strictly decreasing in every direction on coarsening$^{14}$, $g^{n,a}_{\mu\nu}$ must be negative semidefinite in the subspace of scaling variables where $2y_i - d \ge 0$. For these relevant directions, with $i = 0, 1$, $\lambda^n_i \sim A \xi^{2y_i - d} - A b^{(2y_i - d) n}$, where the second term only becomes significant when $b^n \sim \xi$ (when the lattice spacing is comparable to the correlation length). For irrelevant directions, or relevant ones with $0 < 2y_i < d$ (corresponding to $i \ge 2$ in the Ising model), the analytic piece will dominate as the critical point is approached, yielding $\lambda^n_i \sim A b^{(2y_i - d) n}$. These results are in quantitative agreement with those plotted in figure 3 of the main text assuming that our variables project onto irrelevant and marginal scaling variables with leading dimensions of $y = 0$ (blue line in figure 3 of the main text), $y = -2$ (green line in figure 3 of the main text) and $y = -4$ (purple line in figure 3 of the main text), consistent with the theoretical predictions for the irrelevant eigenvalue spectrum made in [14].



VII. SIMULATION DETAILS 

To generate ensembles $x_p$ that are used to calculate the metric before coarsening we use the standard Wolff algorithm [13], implemented on 64x64 periodic square lattices. We generate $M$ = 10,000 to 100,000 independent members from each ensemble, and calculate $g_{\mu\nu}$ as described above.

To generate members of the ensemble defined by eq. 35 we use variations on a method introduced in [T], which they termed 'compatible Monte-Carlo'$^{15}$. Essentially, a Monte-Carlo chain is run in which any move that proposes a switch to a configuration $x'_{p,r}$ for which $C^n(x'_{p,r}) \ne C^n(x_p)$ is summarily rejected. Given our mapping, the constraint $C^n(x_p) = C^n(x'_{p,r})$ is easy to enforce. In the simplest iteration we can equilibrate using Metropolis moves, but only proposing spins which are not in level $n$. We introduce several additional tricks to speed up convergence which we now describe.

Consider the task of generating a random member $x'_{p,r}$ for a given $x_p$ at level 1. Because the spins which are free to move only make contact with fixed spins, each one can be chosen independently. As such, if we choose each 'free' spin according to its heat bath probability then we arrive at an uncorrelated member $x'_{p,r}$ of the ensemble defined by $x_p$.
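The level 1 heat-bath draw described above can be sketched as follows. This is a minimal illustration for the nearest neighbor model only, and it rests on several assumptions made here rather than taken from the text: the Boltzmann weight is written as $\exp(K s \sum_{\rm n.n.} s')$ with a single dimensionless coupling `coupling` $= K$, the lattice is periodic with even side length, sites are indexed from zero, and the fixed (level 1) sites are those with $i + j$ even. The function name `sample_free_spins` is hypothetical.

```python
import numpy as np

def sample_free_spins(spins, coupling, rng):
    """Redraw every free site ((i + j) odd) from its heat-bath distribution.

    Fixed sites ((i + j) even) are returned unchanged, so the result coarsens
    to the same level 1 configuration as the input.
    """
    size = spins.shape[0]
    i, j = np.indices((size, size))
    free = (i + j) % 2 == 1
    # Sum of the four nearest neighbours; for a free site these are all fixed.
    nbr = (np.roll(spins, 1, axis=0) + np.roll(spins, -1, axis=0)
           + np.roll(spins, 1, axis=1) + np.roll(spins, -1, axis=1))
    # Assumed weight exp(coupling * s * nbr) gives P(s = +1) as a logistic function.
    p_up = 1.0 / (1.0 + np.exp(-2.0 * coupling * nbr))
    draw = np.where(rng.random((size, size)) < p_up, 1, -1)
    return np.where(free, draw, spins)

rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=(8, 8))
x_new = sample_free_spins(x, 0.44, rng)
```

Because every free site sees only fixed neighbors, a single pass produces an independent sample, with no equilibration needed at this level.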

This trick can be further exploited to exactly calculate the contribution to a metric element at level 1 from a level 0 configuration $x$. In particular, by replacing all of the spins that are not in level 1 with their mean field values, defined by $\bar{s}_{i,j}(x) = \{s_{i,j}\}_{C^1(x)}$ (which we can calculate in a single step), we can immediately write:



l,J L i (42) 

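As such, it is possible to exactly calculate the level one quantities $\{\Phi_\mu\}_{C^1(x)} \{\Phi_\nu\}_{C^1(x)}$ for any microscopic configuration $x$ and corresponding checkerboard configuration $C^1(x)$. We can write the metric at level 1 as

$$ g^1_{\mu\nu} = \frac{1}{M^2 - M} \sum_{\substack{p,q=1 \\ p \ne q}}^{M} \left( \{\Phi_\mu\}_{C^1(x_p)} \{\Phi_\nu\}_{C^1(x_p)} - \{\Phi_\mu\}_{C^1(x_p)} \{\Phi_\nu\}_{C^1(x_q)} \right) \tag{43} $$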

Beyond level 1 it becomes necessary to use compatible Monte-Carlo, but we can still take advantage of the independence of the free spins at level 1. In particular, spins at all levels $n \ge 1$ only interact with spins that are already absent at level 1. We continue to leave the spins that are free at level 1 (henceforth the red sites, from their color on a checkerboard) integrated out. This partition function is most conveniently written in terms of the number of up neighbors, $n^{up}_{i,j}$, that each red site has:

$$ \begin{aligned} \log Z(C^1(x)) &= \sum_{i,j \text{ not in level 1}} \log(z(n^{up}_{i,j})) \\ z(n^{up}) &= \cosh\left( (\beta J)(2 - n^{up}) \right) \end{aligned} \tag{44} $$



Additional spins that are not integrated out at level $n$ are flipped using a heat bath algorithm, with the ratio of partition functions in an 'up' vs 'down' configuration used to determine the transition probability. The probability of a spin (at level $\ge 2$) transitioning to 'up' after being proposed from the down state is given by



up 
3 



l ) with 



14 In each coarsening step $g^n_{\mu\nu} - g^{n+1}_{\mu\nu}$ must be a positive semidefinite matrix. This is because no parameter combinations can be more measurable from a subset of the data available at level $n$ than from its entirety.

15 Ron, Swendsen and Brandt used this technique for entirely different purposes. They generated large equilibrated ensembles close to the critical point, essentially by starting from a small 'coarsened' lattice and iteratively adding layers to generate a large ensemble.



{k,l} n.n. of {t.j} 

<r n = n , <nZ) 

{k,l} n.n. of 

Equilibration is extremely fast as there are effectively no correlations larger than the spacing between fixed spins at level $n$. This allows us to generate an ensemble of lattice configurations at level 1, conditioned on the system coarsening to an arbitrary configuration at an arbitrary level $n > 1$. As such, for efficiency we slightly modify equation 36 to

1 i= M r,«=M' / , . ,

(46)

This is used to produce figure 3 for data at level 2 and higher.
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